ANNUAL  REPORT  ON 


MIPR  NO.  93MM3513 
for  the  period 

January  11, 1993  -  January  11, 1994 


I.  INTRODUCTION  AND  BACKGROUND 

The  objectives  of  the  above  project  were  formulated  in  discussion  with  Mr.  Henry 
Gardner  of  U.S.  Army  Medical  R&D  Command,  Ft.  Detrick,  Maryland.  The  project 
purpose and work scope were stated in the proposal as follows: to perform mathematical,
statistical  and  risk-analytical  work  in  support  of  the  mission  of  the  Army  Biomedical 
Research  and  Development  Laboratory  (ABRDL).  The  project  continues  and  extends 
work  performed  under  MIPR  No.  91MM1598. 

II.  APPROACHES  TAKEN  AND  PROGRESS 
Work  has  been  initiated  in  three  areas: 

A.  A  SIMULATION  STUDY  OF  THE  BEHAVIOR  OF  ESTIMATORS  OF  THE 
TERATOGENIC  INDEX. 

Appendix  A  contains  a  simulation  study  of  the  behavior  of  estimators  of  the 
teratogenic  index.  Approximations  for  the  variance  of  the  teratogenic  index  and  the 
logarithm of the index are given. Simulation is used to study how well these
approximations yield approximate standard errors and confidence intervals for the
index. The simulation results suggest that the sampling distribution of the estimator of
the teratogenic index is not symmetric, whereas the sampling distribution of the logarithm
of the estimator is more nearly symmetric. Confidence intervals based on the normal
distribution may not have the advertised coverage if the sampling distribution of the
statistic is not symmetric. The simulation results indicate that the coverage of the
confidence intervals is reasonable, particularly for those based on the logarithm of the index.

B.  DESIGN  OF  EXPERIMENTS  AND  ANALYSIS  OF  THE  EXPERIMENTAL  DATA. 

Here  is  an  experimental  protocol  that  may  well  be  of  considerable  usefulness  in 
practice.  It  is  basically  the  same  as  that  currently  used,  but  recommends  an  attitude  of 
caution,  suspicion,  explicitness-of-purpose  and  search  for  explanations  of  what  is 
observed  that  may  verge  on  the  paranoid.  The  purpose  of  such  an  approach  is  to 
understand  and  quantify  the  sources  of  variability  in  the  experiment.  Dr.  Twerdok's 
establishment  of  a  data  base  to  monitor  the  health  of  the  medaka  will  aid  in 
understanding  the  natural  variability  of  the  population. 

(a) Choose a number of experimental animals (e.g. medaka fish, the main
consideration, or rodents) and subject them to specified environmental and dosage
conditions. Identify the tanks by i, i = 1, 2, ..., I, and put $n_i(t_k)$ animals originally in
tank i, where $t_k$ refers to a time of ultimate sacrifice. It is highly desirable to keep track
of happenings in tank i as carefully as possible, e.g. by recording temperature, pH, etc.,
and even the number of fish that die. The initial fish complements of the tanks should be
randomized. Any extra information about individual fish or the respective tanks is
worth having, as initial variations in these may influence the later biological experiences
of the fish. Both the mean and the variance of measurements could be dose-affected.

(b) The fish treatment of interest may involve subjecting them to a steady concentration of a
chemical (DEN, perhaps, or in combination with other affectors) over a period of time. At
the end of the exposure the fish will be removed and examined. Let $n_{ic}(t_k)$ be the
number of fish in tank i that have received chemical dosage c constantly over time $t_k$.
Note that the dosage pattern need not be as simple as described; the subscript c simply
designates the dosage type administered for the time $t_k$. The dosage may be time-varying
(bolus, bolus plus constant, constant for a time, nothing thereafter,
etc.): whatever is biologically interesting or meaningful.

(c)  Control  or  reference  treatments  are  worth  having,  and  often  essential  if  we  want  to 
study  a  more  operational  (groundwater  concentration  levels)  situation.  Unfortunately, 
these  must  be  carried  out  in  separate  tanks,  since  the  chemical  is  in  solution.  In  spite  of 
the  exercise  of  great  care  there  can  be  between-tanks  differences  (over  and  above 
dosage).  Consequently,  replication  of  tank  experience  is  highly  advisable,  indeed 
essential,  in  order  to  be  able  to  estimate  between-tank  contribution  to  variation  in 
endpoints  of  direct  biological  concern. 

Appendix  B  reports  the  analysis  of  some  data  from  a  bioassay  study.  A  procedure  to 
assess  the  variability  between  tanks  in  the  same  treatment  groups  is  proposed.  The 
procedure  is  then  used  to  assess  the  variability  between  tanks.  The  analysis  suggests 
that  there  is  a  tank  effect  within  treatments. 

We  will  investigate  (by  mathematics  and  perhaps  computer  simulation)  the  effect  of 
number  of  replications,  numbers  of  original  subjects,  dosages,  and  sacrifice  times,  plus 
the  various  endpoint  observations  that  may  be  informative,  not  to  mention  the  influence 
of  the  types  of  parametric  dose-response  models  used  to  summarize  the  data  and  the 
ways  those  models  can  be  fitted,  and  the  fit  quality  and  informativeness.  These  studies 
will  contribute  to  the  scientific  understanding  needed  to  minimize  the  number  of 
animals  used  in  experiments. 

B.1. Anticipatory Dose-Response Predictor Methodology

Suppose  we  want  to  infer  the  effect  of  a  dosage  level  of  some  agent,  say  TCE,  on  an 
endpoint  of  interest,  e.g.  occurrence  of  (pre)cancerous  foci.  The  delay  in  getting  any  foci 
at  small  doses  suggests  search  for  a  precursor.  One  possibility  is  a  proliferative  index  (PI) 
change  or  level  obtained  by  a  staining  technique:  roughly  speaking  the  technique 
identifies the fraction of cells in a replicative stage at the sampling time. The argument
is that there may be an exploitable relationship between (some form of) PI level,
observable relatively soon, and later appearance of foci.

Such  an  anticipatory  strategy  is  promising  and  could  well  be  highly  profitable  for 
risk  analyses.  We  wish  to  establish  credible  analytical  tools  for  handling  such  data. 

These  tools  may  well  be  of  use  more  extensively,  to  the  benefit  of  risk  analysts. 

B.2. Models

There are a number of models that may credibly connect PI and F (foci prevalence)
data. Here is one that is essentially off-the-shelf and could be useful.

Logistic

Assume that there is a regression-like relation between

$PI(t_k)$, the proliferative index at sacrifice time $t_k$, $k = 1, 2, \ldots, K$,

and

$f_l/n_l$, the fraction of subjects exhibiting response (= foci) at sacrifice time $t_l$,

where $t_l$ is later than each $t_k$.

The basic model is that $f_l/n_l$ may be approximately of the form

$$\frac{\exp\{a + b_1 PI(t_1) + b_2 PI(t_2) + \cdots + b_K PI(t_K) + c\,t_l\}}{1 + \exp\{a + b_1 PI(t_1) + b_2 PI(t_2) + \cdots + b_K PI(t_K) + c\,t_l\}}$$

where $a$, $b_1, b_2, \ldots, b_K$ and $c$ are unknown constants that can be estimated by
maximum likelihood. Here "exp" stands for exponential. This is a standard form for
predicting probabilities, available in many statistical packages, such as BMDP and
probably SAS, etc.
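
As an illustration only, the logistic form above and the binomial log-likelihood that maximum likelihood would maximize can be sketched in Python. The function names and the data layout (one tuple of PI levels, later sacrifice time, number responding, and number examined per group) are assumptions made for this sketch and do not come from the report; the constant binomial coefficient is omitted because it does not affect the maximization.

```python
import math


def _linear_predictor(a, b, c, pi_levels, t_later):
    """a + b_1*PI(t_1) + ... + b_K*PI(t_K) + c*t_later."""
    if len(b) != len(pi_levels):
        raise ValueError("need one coefficient per PI measurement")
    return a + sum(bk * pk for bk, pk in zip(b, pi_levels)) + c * t_later


def _softplus(x):
    """log(1 + exp(x)), computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_fraction(a, b, c, pi_levels, t_later):
    """Model value for the fraction of subjects showing foci."""
    eta = _linear_predictor(a, b, c, pi_levels, t_later)
    return math.exp(-_softplus(-eta))  # equals exp(eta) / (1 + exp(eta))


def binomial_log_likelihood(a, b, c, data):
    """Log-likelihood (up to a constant) for groups of binomial counts.

    data: iterable of (pi_levels, t_later, f, n), where f of n subjects
    show the response at the later sacrifice time (hypothetical layout).
    """
    total = 0.0
    for pi_levels, t_later, f, n in data:
        eta = _linear_predictor(a, b, c, pi_levels, t_later)
        # log p = -softplus(-eta); log(1 - p) = -softplus(eta)
        total += -f * _softplus(-eta) - (n - f) * _softplus(eta)
    return total
```

A general-purpose optimizer, or the logistic regression routine of a statistical package, would then be used to maximize `binomial_log_likelihood` over `a`, `b` and `c`.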

C.  TOWARDS  DECISION-ORIENTED  SCORING  OF  TOXIC-WASTE 
REPOSITORY/SITE  CLEANUP. 

The  U.S.  continental  area  is  dotted  with  a  number  of  landsites  that  have  been 
dedicated to the containment of toxic wastes, so-called toxic waste repositories (TWRs).
Several to many of these sites are located on military bases, e.g., at Aberdeen Proving
Ground  and  at  Rocky  Mountain  Arsenal  but  elsewhere  as  well.  Such  sites  contain  large 
amounts  of  various  seriously  toxic  elements  that  have  been  entering  ground  water  and 
appearing  as  effluent  from  the  sites.  Other  environmental  components  such  as  soil  may 
well  be  affected;  contaminated  soil  can  be  dispersed  by  wind  and  rain.  Owing  in  part  to 
the  planned  reduction  of  overall  U.S.  military  investment,  including  requirement  for 
land  for  weapons  testing,  but  also  to  a  growing  appreciation  of  environmental  threat, 
certain  areas  containing  military  (and  civilian)  TWRs  are  to  be  closed  to  further  input 
and  cleaned  up  so  as  to  reduce  hazards  to  humans  in  the  environment;  also  of  concern 
is the broader environment: its vegetation and wildlife. Note that clean up
may not be limited to those sites that are to be totally closed;
the human environmental impact of sites that remain operational will remain of
concern.

C.1. The Problem

To clean a site means here to reduce its undesirable impact to a tolerable or
inoffensive level, appropriately defined. At least one important component of such
impact is measured by a collection of potentially hazard-related chemical constituents of the
groundwater and effluents associated with the sites. Excessive presence of the above
chemicals is believed to be threatening to human life and the environment. Thus clean
up is aimed at reducing the concentration of such items in ground water and effluent to
a tolerable level.

Clean  up  is  to  be  accomplished  in  a  cost-effective  manner.  Note  that  we  use  water 
clean  up  as  an  example  and  would  not  limit  attention  to  it  alone  if  other  elements  such 
as  air  quality,  surface  soil  condition,  etc.,  are  abnormal  and  are  options  for  improvement 
or  "clean  up,"  as  will  very  likely  be  the  case. 

There  are  various  problems  of  detail  that  confront  a  decision-maker  who  deals  with 
clean up. We review these as they are now understood.

Some specific problems are these.

(1) The contents, and pollution potential, of each TWR are only vaguely known.
The potential for environmental damage may be related to complex combinational or
serendipitous behavior of many chemicals.

(2) The concentrations of the various potentially hazardous components in
groundwater and effluent are likely to vary haphazardly ("randomly") over time due to
local  conditions  such  as  rainfall  and  general  groundwater  level,  but  also  because  of  the 
rate  of  decomposition  of  materials  and  their  containments  in  the  ground,  chemical 
reactions,  etc.  This  noisy  background  helps  to  obscure  desired  signals  of  a  concentration 
decrease  resulting  from  clean  up  effort.  In  fact,  it  has  been  remarked  by  Travis  (1992) 
that  pumping  to  remediate  groundwater  in  contaminated  aquifers  may  be  only 
apparently,  and  actually  just  temporarily,  effective.  The  reason  is  that  while  pumping 
may  reduce  concentrations,  dense  NAPLs  are  nearly  permanently  in  place  at  aquifer 
bottoms,  where  their  dissipation  is  by  slow  molecular  diffusion.  Observed  contaminant 
concentration  reduction  while  pumping  is  largely  the  effect  of  dilution;  concentrations 
have  been  observed  to  go  back  up  once  pumping  stops.  If  soil  is  contaminated  for  a 
long  time  with  hydrophobic  chemicals  the  same  effect  prevails. 

(3) The relative importance of clean up of the various items (chemicals) may be
unclear; some may require more attention than others. In fact, "clean up" needs to be
defined in a way that is agreed upon, scientifically supportable (or not wholly specious),
cost-effective and practical, and communicable.

(4)  Chemical  indicators  alone  may  well  not  portray  hazard  adequately,  particularly 
as  these  conspire  to  affect  complex  biological  organisms  such  as  mankind,  wildlife, 
vegetation,  etc.  For  this  reason,  testing  for  clean  up  with  actual  organisms,  e.g.,  the 
Japanese  medaka  fish  or  frog  embryos,  is  an  attractive  supplement.  Fish  may  be  a  good 
medium  by  which  to  track  a  propensity  for  certain  diseases  such  as  cancer,  but  may  well 
be useless for other indications, such as air quality. Other biological markers, such as
plants, have promise. The use of organism indicators sounds primitive, but has been
historically effective. These biomonitoring test systems (BTS) perform as interpreters of
complex dosage. However, it should be noted that not all individual
biological markers are identical. Differences between individual organisms, e.g., fish, may
well mask responses to different contaminant levels, producing a "noisy" signal
concerning current, or average, contaminant level. It is imperative that careful and
appropriate  analytical  statistical  tools  be  brought  to  bear  to  guide  acquisition  and 
interpretation  of  data  from  TWR  monitoring.  A  further  issue  is  how  to  combine 
information  from  analyses  of  individual  BTS  (applied  at  different  locations  and  times  at 
a  specified  site). 

C.2.  Decision  Assistance 

A  decision  maker  has  various  options  with  respect  to  a  TWR.  Here  are  some. 

(a)  Leave  the  TWR  alone.  This  may  be  acceptable  in  certain  cases  in  view  of  cost  and 
judged  impact;  see  Travis  (1992).  Or  it  may  reflect  a  priority  scheme  that 
postpones  action  in  favor  of  a  more  pressing  need  elsewhere.  An  assessment 
procedure  that  reliably  and  defensibly  assesses  the  future  effect  seems  necessary. 
Expert  judgement  is  useful  but  not  sufficient. 

(b)  Isolate  or  contain  without  clean  up.  This  may  have  been  done  in  Europe 
(Germany)  with  certain  rivers,  e.g.,  the  Rhine.  A  monitoring  and  assessment 
procedure  seems  essential. 

(c)  Complete  surgical  clean  up  once  and  for  all  by  excavation,  offsite  re-location,  or 
decomposition  of  contents.  Replace  soil.  This  "organ  transplant"  (Oregon 
transplant!)  procedure  may  be  far  too  expensive  in  practice,  but  is  perhaps  an 
ideal.  Once  again,  an  attempt  to  measure  and  quantitatively  assess  the  degree  of 
cleanup  is  desirable. 

(d) Perform partial clean up as in (c), then process water that interacts with the TWR and
reaches  outside,  e.g.,  enters  groundwater  that  is  used  elsewhere,  or  flows  into 
streams  or  rivers.  This  may  be  a  common  option.  Its  effectiveness  may  well 
depend  on  contaminants  present. 

Question:  How  does  the  decision  maker  decide  whether  the  processing  procedure  is 
sufficient;  i.e.,  is  the  permitted  output  clean  enough?  The  answer  to  this  question  must, 
for political reasons, be defensible on a level somewhat comprehensible to an informed,
attentive layman. Desirably, the procedure adopted must also be legally defensible and
cost  effective.  A  lucid  and  objective  quantitative  approach  seems  essential.  We  are 
continuing  to  actively  review  and  appraise  the  relevant  literature  and  directives,  e.g., 
from  EPA. 

In  what  follows  we  propose  quantitative  attacks,  particularly  on  option  (d),  partial 
clean  up  and  processing  of  effluent. 

C.3. Quantitative Approaches; States

The decision maker is potentially able to measure the current (time t) concentrations
of n individual chemical contaminants at chosen sampling times t; call these

$$y_1(t), y_2(t), \ldots, y_n(t). \qquad (1)$$

For instance the first component $y_1(t)$ might be ppm of arsenic as measured at t = 15
days after clean up begins; the last component, $y_n(t)$, might be a concentration of dioxin.
It is expected that before treatment, which begins at t = 0, $y_j(t)$, t < 0, will vary haphazardly, possibly
because of variations attributable to season of the year, basic water flow, temperature,
age of certain TWR contents, and so on; in other words each concentration $y_j(t)$, t > 0, is a
time series, and the collection of all is a multi-variable time series. Ideally we intend
and  anticipate  that  the  general  level  (e.g.,  mean)  of  such  time  series  will  decrease  with 
time  measured  from  the  "instant"  when  processing  starts;  presumably  we  also  want 
large  excursions,  or  pulses,  of  contaminant  concentration  to  become  much  less  frequent 
as treatment continues. Note that the above concentrations may realistically be
composites of measures taken at various spatial points in an aquifer, so a more inclusive
portrayal of reality is the time-space series $\{y_i(t_j, l_j)\}$, where $l_j$ is the location of
observation j. The subsequent discussion omits consideration of this detail.

In  addition  to  chemicals  we  include  organic  indicators  (biomarkers)  such  as  the 
numbers  of  medaka  livers  containing  tumors  out  of  a  number  exposed  in  a  sample 
and/or the result of FETAX assays; call the results of these assays

$$z_1(t), z_2(t), \ldots, z_m(t) \qquad (2)$$

if  there  are  m  such  indicators.  An  important  question  is  the  choice  of  the  number  of 
biomarkers  to  use  and  the  frequency  of  their  use.  Note  that  the  chemical  concentrations 
will  probably  be  modeled  on  "continuous"  random  processes  (possibly,  but  not 
necessarily, Gaussian), while the counts will be "discrete" (i.e. taking on values like 0, 1,
2, ..., 13, ...). Of course not all organic measures (we allow for m > 1) need be discrete.
Thus we think of the status ("state") of the system (TWR) as given by $(y(t), z(t))$ at
time t, i.e. two collections of measurements or counts. An important and practical
research question is to specify a suitable, and cost-effective, contamination profile or site
state vector $(y(t), z(t))$.

C.4.  Quantitative  Assessment  of  Clean-up  Adequacy 

An  (impractical)  ideal  would  be  to  insist  on,  and  attempt  to  guarantee,  no  (zero) 
contamination  by  each  identified  contaminant  after  the  cleanup  project  begins.  This  is 
unrealistic  because,  first,  true  concentrations  are  likely  to  fluctuate  over  time  and  over 
space,  i.e.  with  location  within  and  near  the  site,  and  second,  measured  concentrations,  or 
their  surrogates  such  as  the  number  of  tumor-infected  medaka  livers  in  an  exposed 
group  of  fish  will  not  give  a  totally  noise-free  indication  of  the  true  effective 
concentration prevailing at a particular time. In other words, a measurement portrait
may very likely be a somewhat inaccurate portrayal of a particular item's true time-space
concentration. With this in mind consider the following.

Objective: Set up a simple index of the TWR's overall contamination condition at, or
close to, any time t that accounts for the effects of measurement error, the inherent
variability  of  biological  systems,  and  estimated  true-value  fluctuation  for  each 
(recognized)  contamination  component. 

It  cannot  now  be  argued  that  a  simple  single  index  can  be  devised  that  summarizes 
"site  health"  in  a  totally  satisfactory  and  non-controversial  way.  However,  effort 
should go towards devising such an index, and establishing its credibility, if only to
assist in communication and to guide policy and decision makers. Behind such an
index  should  be  more  specific  measures  of  cleanup  performance,  i.e.  response  to 
cleanup  efforts  targeting  specific  pollutants. 

Some  tentative  suggestions  for  accomplishing  the  objective  are  made  below.  There 
are  various  options  that  have  different  good  and  bad  points,  not  all  of  which  are  well- 
understood.  It  is  proposed  to  lay  out  some  of  these  options  and  to  continue  to  conduct 
research  on  their  relative  merits.  It  is  also  proposed  to  search  for  and  evaluate  other 
options. 

Option  1:  Multiple  Hypothesis  Testing 

A conventional way to assess the effect of a treatment is to choose a relevant
measurable response whose generic value is $y_i(t)$ for the i-th contaminating element at
time t, measure it (replicate) J times under (a) remediated and (b) control conditions, or
alternatively, (b'), with reference to a tolerable threshold for $y_i$. Then perform a classical
one-sided hypothesis test of a suitable null hypothesis. Suppose a relevant test statistic
is denoted by $\Delta_i(t)$, the difference between the response summary $\bar{y}$ under remediated
conditions and that under control (or threshold) conditions,
where $\bar{y}$ denotes a summary of the J replicates mentioned earlier; this summary can be
a simple mean, a robust alternative (e.g., median or other M-estimate), or a relevant
parameter estimated by likelihood or Bayes methodology. Assume that $\bar{y}$ responds
positively to the presence of contaminant: the greater the concentration of element i
present, the greater $\Delta_i$ would tend to be. Then an appropriate null hypothesis could be
that $\Delta_i(t)$ is a sample from a Normal distribution with zero mean and standard deviation
$\sigma_i$. A test of the i-th null hypothesis alone would be: reject it, i.e. reject the hypothesis that
remediation has brought the concentration of element i under control, if $\Delta_i(t)$ exceeds a
cutoff value that is, say, the 95th percentile of a normal distribution with mean
zero and standard deviation $\sigma_i$, or better, of the corresponding t-distribution.

Unfortunately, if the number of contaminant elements being tested for is large, it
becomes likely that at least one such test asserts a significant difference, namely that $\Delta_i(t)$
exceeds its cutoff, even if all null hypotheses are actually true. This is the multiplicity problem. A
way of addressing it is as follows. A p-value associated with $\Delta_i(t)$ is the probability that
a random sample from the above population exceeds the observed (measured) $\Delta_i(t)$
value; let $p_i$ be the numerical value of the i-th p-value. Actually, since $\sigma_i$ will be unknown
and hence must be estimated, the appropriate reference distribution should be a Student
t. Note that this p-value becomes small if the difference between the responses
under remediated conditions and under control or threshold conditions is large,
indicating that remediation is not (yet) effective. If there are I (e.g., 10) different
contaminating elements being tested for, then one can assess the combined significance
of all elements by use of the Fisherian statistic

$$-2 \sum_{i=1}^{I} \ln p_i.$$

Under the null hypothesis of no difference, in any element, between remediated
response and control, the above is distributed as a chi-square random variable with 2I
degrees of freedom; an observed value high enough to exceed the 95th or 99th percentile
of such a chi-square distribution suggests that, overall, the current remediation effort
has not been successful. A useful informal supplement would be to plot the individual
$p_i$-values to see if they appear to be uniformly distributed over [0,1]; if most were close
to unity, but several close to zero, the latter "several" would be implicated as those
elements not yet affected by remediation.
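
The combined test is short enough to sketch in Python. This is a minimal illustration, not a procedure taken from the report: the function names are hypothetical, and the chi-square reference distribution assumes the individual p-values are independent. Because the degrees of freedom are always even (2I), the upper-tail chi-square probability has a closed form and no statistical library is needed.

```python
import math


def fisher_statistic(p_values):
    """Fisher's combined statistic: -2 * sum of ln(p_i)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must be a non-empty list in (0, 1]")
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_upper_tail_even_df(x, half_df):
    """P(X > x) for a chi-square variable with 2 * half_df degrees of freedom.

    For even degrees of freedom the tail is
    exp(-x/2) * sum_{j=0}^{half_df-1} (x/2)^j / j!.
    """
    half_x = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, half_df):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def fisher_combined_p(p_values):
    """Return (statistic, combined p-value) for I independent p-values."""
    stat = fisher_statistic(p_values)
    return stat, chi2_upper_tail_even_df(stat, len(p_values))
```

A small combined p-value (say below 0.05 or 0.01) corresponds to the statistic exceeding the 95th or 99th percentile of the chi-square distribution with 2I degrees of freedom.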

Note  that  the  above  process  is  suggested  as  an  informal  screening  procedure.  It  has 
various  flaws;  many  are  identified  in  the  NRC  Combination  of  Information  Report  (1992), 
abbreviated NRC/CI. For one thing the test has no explicit dependence on sample size
for the individual element summaries; for another, there is no acknowledged
dependence on the individual test statistic distributions, nor on the cost of either
erroneously accepting the remediation as complete when it is not or of continuing with
the effort when it is not required. For still another, there is no attempt to "borrow
strength" by utilizing information about the effect of the same remediation strategy on
the same contaminating elements at other toxic waste repositories.

Option  2:  Maximum  Test  Risk 

For the k-th biomonitoring or other test system (k = 1, ..., K) determine the largest
dose level $d_k$ (e.g. lowest groundwater dilution) so that the probability that the response
at that dose is greater than that for the control is less than a (small) number r. A possible
decision rule is to declare the smallest value (lowest concentration) $\min_k d_k$ safe at
maximum test risk level r.
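
The decision rule can be sketched as follows, under stated assumptions. The input layout (for each test system, a list of tested doses paired with an estimated probability that the response exceeds control) and the function names are hypothetical. The sketch also assumes that "largest dose" means the largest tested dose such that it and every lower tested dose have estimated risk below r; the report does not spell this out.

```python
def largest_acceptable_dose(dose_risks, r):
    """Largest tested dose whose risk, and that of all lower doses, is below r.

    dose_risks: list of (dose, estimated_risk) pairs for one test system.
    Returns None if even the lowest tested dose has risk >= r.
    """
    best = None
    for dose, risk in sorted(dose_risks):
        if risk >= r:
            break
        best = dose
    return best


def maximum_test_risk_dose(systems, r):
    """Smallest of the per-system acceptable doses (min over k of d_k).

    systems: dict mapping a test-system name to its (dose, risk) list.
    Returns None if there are no systems or any system has no acceptable dose.
    """
    doses = []
    for dose_risks in systems.values():
        d_k = largest_acceptable_dose(dose_risks, r)
        if d_k is None:
            return None
        doses.append(d_k)
    return min(doses) if doses else None
```

For example, with two hypothetical systems whose acceptable doses at r = 0.05 are 0.5 and 0.1, the rule would declare 0.1 safe at maximum test risk level 0.05.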

III.  RECOMMENDATION 

Further  work,  both  theoretical  and  applied,  is  required  to  put  the  above  ideas  for 
combining  information  into  practice.  It  is  proposed  that  this  work  and  research  into  the 
quantitative analysis of bioassay data be continued.

Appendix A


A  Simulation  Study  of  the  Behavior  of  Estimators 
of  the  Teratogenic  Index 

Animal experiments are used to study the effects of the dose of a potential
toxin. One measure of the effect is the dose which is lethal to 50% of the
population, LD50 = $\mu_N$. Another measure is the dose which produces undesirable
symptoms in 50% of the population, ED50 = $\mu_D$. A combined measure of the
effect of the toxin is the ratio $\gamma = \mu_N/\mu_D$ of the two doses; such a ratio is the
teratogenic index. Large values, $\gamma \gg 1$, are of concern.

In this note we use simulation to investigate the behavior of two approximate
expressions for the standard deviation of $\hat\gamma = \hat\mu_N/\hat\mu_D$.

Two Approximate Expressions for the Variance of $\hat\gamma$ and $\log\hat\gamma$.

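In this section we use the "delta method" to obtain an approximation for the
variance of the ratio $\hat\gamma = \hat\mu_N/\hat\mu_D$ and an approximation for the variance of the
log ratio $\log\hat\gamma$. This is a simple way of combining standard errors from dose-response
(e.g. probit) analyses to obtain approximate standard errors and
confidence intervals for $\gamma$.

The probit model is often used to estimate $\mu_N$ and $\mu_D$. In this case, the
sampling distribution of $\hat\mu_N$ and $\hat\mu_D$ is asymptotically normal. Hence we write

$$\hat\gamma = \frac{\hat\mu_N}{\hat\mu_D} = \frac{\mu_N + \varepsilon_N}{\mu_D + \varepsilon_D} \approx (\mu_N + \varepsilon_N)\left[\frac{1}{\mu_D} - \frac{\varepsilon_D}{\mu_D^2}\right] \qquad (1)$$

where $\mu_N$ and $\mu_D$ are the true parameter values and $\varepsilon_N$ and $\varepsilon_D$ are independent
normal random variables with mean 0; the last expression follows from a partial
Taylor expansion of $(\mu_D + \varepsilon_D)^{-1}$ about $\mu_D$. Thus

$$\hat\gamma \approx \frac{\mu_N}{\mu_D}\left[1 + \frac{\varepsilon_N}{\mu_N} - \frac{\varepsilon_D}{\mu_D}\right] = \gamma\left[1 + \frac{\varepsilon_N}{\mu_N} - \frac{\varepsilon_D}{\mu_D}\right].$$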


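Hence,

$$\mathrm{Var}[\hat\gamma] \approx \gamma^2\left[\frac{\sigma_N^2}{\mu_N^2} + \frac{\sigma_D^2}{\mu_D^2}\right]$$

where $\sigma_N^2$ (respectively $\sigma_D^2$) is the variance of $\varepsilon_N$ (respectively $\varepsilon_D$). A crude but
convenient estimate of $\mathrm{Var}[\hat\gamma]$ is

$$\widehat{\mathrm{Var}}[\hat\gamma] = \hat\gamma^2\left[\frac{s_N^2}{\hat\mu_N^2} + \frac{s_D^2}{\hat\mu_D^2}\right] \qquad (2)$$

where $s_N^2$ and $s_D^2$ are the sample variances of $\hat\mu_N$ and $\hat\mu_D$. The standard error of $\hat\gamma$
is

$$\mathrm{SE}[\hat\gamma] = \hat\gamma\left[\frac{s_N^2}{\hat\mu_N^2} + \frac{s_D^2}{\hat\mu_D^2}\right]^{1/2}. \qquad (3)$$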


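The logarithm of a ratio estimate, such as $\hat\gamma$, is often more stable numerically
and often has a more symmetric sampling distribution than
the ratio itself. We next derive an approximate expression for
$\mathrm{Var}[\log\hat\gamma]$. Note that

$$\begin{aligned}
\log\hat\gamma &= \log\hat\mu_N - \log\hat\mu_D \\
&= \log(\mu_N + \varepsilon_N) - \log(\mu_D + \varepsilon_D) \\
&\approx \log\mu_N + \frac{\varepsilon_N}{\mu_N} - \log\mu_D - \frac{\varepsilon_D}{\mu_D}
\end{aligned} \qquad (4)$$

where the last expression follows from the first two terms of a Taylor expansion
of the log about $\mu_N$ and $\mu_D$. Thus,

$$\mathrm{Var}[\log\hat\gamma] \approx \frac{\sigma_N^2}{\mu_N^2} + \frac{\sigma_D^2}{\mu_D^2}. \qquad (5)$$

A crude estimate of $\mathrm{Var}[\log\hat\gamma]$ is

$$\widehat{\mathrm{Var}}[\log\hat\gamma] = \frac{s_N^2}{\hat\mu_N^2} + \frac{s_D^2}{\hat\mu_D^2} \qquad (6)$$

and

$$\mathrm{SE}[\log\hat\gamma] = \left[\frac{s_N^2}{\hat\mu_N^2} + \frac{s_D^2}{\hat\mu_D^2}\right]^{1/2}. \qquad (7)$$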
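
The two delta-method standard errors, (3) and (7), are simple enough to compute directly. The sketch below is illustrative; the function name and argument order are assumptions, and the inputs are the two estimated doses with their standard errors as they might come from a probit fit.

```python
import math


def delta_method_se(mu_n, mu_d, s_n, s_d):
    """Return (ratio, SE of ratio, SE of log ratio) by the delta method.

    mu_n, mu_d: estimated doses (numerator and denominator of the index).
    s_n, s_d:   their standard errors (square roots of the sample variances).
    """
    if mu_n == 0 or mu_d == 0:
        raise ValueError("both estimates must be non-zero")
    ratio = mu_n / mu_d
    # Square root of the sum of squared relative standard errors.
    se_log = math.sqrt((s_n / mu_n) ** 2 + (s_d / mu_d) ** 2)
    se_ratio = abs(ratio) * se_log
    return ratio, se_ratio, se_log
```

With the values used in the simulation study, `delta_method_se(1.0, 0.5, 0.4 / math.sqrt(10), 0.3 / math.sqrt(10))` gives standard errors that round to 0.46 for the ratio and 0.23 for the log ratio, matching the asymptotic values reported for that case in Table 1.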


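A Simulation Experiment.

In the following simulation experiments $s_N = 0.4/\sqrt{10}$ and $s_D = 0.3/\sqrt{10}$ are fixed.
These particular numbers were obtained from a manuscript by F. Hoffman. For
the k-th replication of the experiment, a random number $\hat\mu_N(k)$ (respectively
$\hat\mu_D(k)$) is drawn from a normal distribution with mean $\mu_N$ and variance $s_N^2$
(respectively $\mu_D$ and $s_D^2$). The estimates $\hat\gamma(k) = \hat\mu_N(k)/\hat\mu_D(k)$ and
$\log\hat\gamma(k) = \log\hat\mu_N(k) - \log\hat\mu_D(k)$ are computed. The sample asymptotic standard
deviations, which are the square roots of the crude variance estimates for $\hat\gamma$ and $\log\hat\gamma$,

$$v_\gamma(k) = \frac{\hat\mu_N(k)}{\hat\mu_D(k)}\left[\frac{s_N^2}{\hat\mu_N(k)^2} + \frac{s_D^2}{\hat\mu_D(k)^2}\right]^{1/2}$$

and

$$v_{\log\gamma}(k) = \left[\frac{s_N^2}{\hat\mu_N(k)^2} + \frac{s_D^2}{\hat\mu_D(k)^2}\right]^{1/2}$$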
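are evaluated. Approximate 95% normal confidence intervals are calculated:

$$I_\gamma(k) = \frac{\hat\mu_N(k)}{\hat\mu_D(k)} \pm (1.96)\,v_\gamma(k)$$

and

$$I_{\log\gamma}(k) = \log\hat\gamma(k) \pm (1.96)\,v_{\log\gamma}(k).$$

The simulation is replicated 500 times and the following statistics are
computed. The sample means of the estimated ratio and the estimated log ratio

$$\bar\gamma = \frac{1}{500}\sum_{k=1}^{500}\frac{\hat\mu_N(k)}{\hat\mu_D(k)} = \frac{1}{500}\sum_{k=1}^{500}\hat\gamma(k)$$

$$\overline{\log\gamma} = \frac{1}{500}\sum_{k=1}^{500}\left[\log\hat\mu_N(k) - \log\hat\mu_D(k)\right] = \frac{1}{500}\sum_{k=1}^{500}\log\hat\gamma(k)$$

and the sample standard deviations of the estimated ratio and estimated log ratio

$$\hat\sigma_\gamma = \left[\frac{1}{499}\sum_{k=1}^{500}\left(\hat\gamma(k) - \bar\gamma\right)^2\right]^{1/2}$$

$$\hat\sigma_{\log\gamma} = \left[\frac{1}{499}\sum_{k=1}^{500}\left(\log\hat\gamma(k) - \overline{\log\gamma}\right)^2\right]^{1/2}$$

The average lengths of the confidence intervals of the ratio and log ratio are
computed, where

$$\bar L_\gamma = \frac{1}{500}\sum_{k=1}^{500} L_\gamma(k)$$

with

$$L_\gamma(k) = 2(1.96)\,v_\gamma(k)$$

and

$$\bar L_{\log\gamma} = \frac{1}{500}\sum_{k=1}^{500} L_{\log\gamma}(k)$$

with

$$L_{\log\gamma}(k) = 2\exp\{(1.96)\,v_{\log\gamma}(k)\};$$

the two endpoints of the confidence interval for $\log\gamma$ are exponentiated to give
an interval for $\gamma$. Finally the fraction of intervals $I_\gamma(k)$ which cover $\gamma$ and the
fraction of intervals $I_{\log\gamma}(k)$ which cover $\log\gamma$ are computed.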
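
A rough Python sketch of the simulation experiment is given below for readers who wish to repeat it. It is not the code used for the study: the function name, the seed handling, and the returned dictionary keys are assumptions. The length of the log-based interval is taken here as the difference of the exponentiated endpoints, following the verbal description. A replicate with a non-positive draw would make the logarithm fail; with the parameter values of the study this is extremely unlikely, so the sketch simply lets the error propagate.

```python
import math
import random


def simulate_intervals(mu_n, mu_d, s_n, s_d, n_rep=500, seed=0, z=1.96):
    """Simulate coverage and mean length of ratio and log-ratio intervals."""
    rng = random.Random(seed)
    gamma = mu_n / mu_d
    log_gamma = math.log(gamma)
    cover_ratio = cover_log = 0
    length_ratio = length_log = 0.0
    for _ in range(n_rep):
        m_n = rng.gauss(mu_n, s_n)  # simulated estimate of the numerator dose
        m_d = rng.gauss(mu_d, s_d)  # simulated estimate of the denominator dose
        g = m_n / m_d
        log_g = math.log(g)  # raises ValueError if g <= 0
        v_log = math.sqrt((s_n / m_n) ** 2 + (s_d / m_d) ** 2)
        v_ratio = g * v_log
        # Interval for the ratio itself.
        lo, hi = g - z * v_ratio, g + z * v_ratio
        cover_ratio += lo <= gamma <= hi
        length_ratio += hi - lo
        # Interval for the log ratio; endpoints exponentiated for the length.
        log_lo, log_hi = log_g - z * v_log, log_g + z * v_log
        cover_log += log_lo <= log_gamma <= log_hi
        length_log += math.exp(log_hi) - math.exp(log_lo)
    return {
        "coverage_ratio": cover_ratio / n_rep,
        "coverage_log": cover_log / n_rep,
        "mean_length_ratio": length_ratio / n_rep,
        "mean_length_log": length_log / n_rep,
    }
```

Calling `simulate_intervals(1.0, 0.5, 0.4 / math.sqrt(10), 0.3 / math.sqrt(10))` repeats the case in which the denominator is small; the exact figures depend on the random number generator and seed, so they should be expected to resemble, not reproduce, the tabulated values.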

The results are presented in Table 1. Displayed in Table 1 are the asymptotic
standard deviations $v_\gamma$ and $v_{\log\gamma}$ in each case
and the corresponding sample standard deviations $\hat\sigma_\gamma$ and $\hat\sigma_{\log\gamma}$. Comparison of
$v_\gamma$ and $\hat\sigma_\gamma$ suggests that the approximate standard deviation for $\gamma$ is reasonably
accurate as long as $\mu_D$ is not too close to 0. Similarly, comparison of $v_{\log\gamma}$ and
$\hat\sigma_{\log\gamma}$ suggests that the approximate variance for $\log\gamma$ is even more accurate.
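
As a small worked check, the asymptotic standard deviations can be evaluated at the true means for the four cases in Table 1. The sketch below assumes the delta-method forms $v_{\log\gamma} = [s_N^2/\mu_N^2 + s_D^2/\mu_D^2]^{1/2}$ and $v_\gamma = (\mu_N/\mu_D)\, v_{\log\gamma}$ with $s_N = 0.4/\sqrt{10}$ and $s_D = 0.3/\sqrt{10}$; the function name `asymptotic_sds` is hypothetical.

```python
import math


def asymptotic_sds(mu_n, mu_d, s_n, s_d):
    """Return (v_gamma, v_log_gamma) evaluated at the given means."""
    v_log = math.sqrt(s_n**2 / mu_n**2 + s_d**2 / mu_d**2)
    return (mu_n / mu_d) * v_log, v_log


S_N = 0.4 / math.sqrt(10)
S_D = 0.3 / math.sqrt(10)

if __name__ == "__main__":
    # The four (mu_N, mu_D) pairs used in the simulation study.
    for mu_n, mu_d in [(1.0, 0.5), (2.0, 1.0), (1.5, 1.0), (3.0, 1.0)]:
        v_g, v_log = asymptotic_sds(mu_n, mu_d, S_N, S_D)
        print(f"mu_N={mu_n}, mu_D={mu_d}: v_gamma={v_g:.2f}, v_log_gamma={v_log:.2f}")
```

Rounded to two decimals, these values agree with the asymptotic columns of Table 1, which suggests the tabulated asymptotic standard deviations were evaluated at the true means.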

Histograms of the estimates $\{\hat\gamma(k)\}$ and $\{\log\hat\gamma(k)\}$ are displayed as Figure 1 for
the case $\mu_N = 1$, $\mu_D = 0.5$, in which $\mu_D$ is close to 0. Note that the histogram of
$\{\hat\gamma(k)\}$ suggests that the sampling distribution of $\hat\gamma$ is somewhat skewed to the
right. The histogram of $\{\log\hat\gamma(k)\}$ appears more symmetric. Confidence intervals
based on the normal distribution may not have the advertised coverage if the
sampling distribution of the statistic is not symmetric.
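
The asymmetry noted here can also be summarized numerically. The sketch below is an illustration only: it draws 500 ratios for the case $\mu_N = 1$, $\mu_D = 0.5$ and compares the sample skewness of the ratio with that of its logarithm. The helper `skewness`, the seed, and the use of the moment-based skewness coefficient are assumptions; the report itself relies on the histograms.

```python
import math
import random


def skewness(xs):
    """Sample skewness: third central moment over the cubed standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)  # arbitrary seed, chosen only for reproducibility
s_n, s_d = 0.4 / math.sqrt(10), 0.3 / math.sqrt(10)

# 500 simulated ratio estimates and their logarithms.
ratios = [rng.gauss(1.0, s_n) / rng.gauss(0.5, s_d) for _ in range(500)]
logs = [math.log(g) for g in ratios]

print(f"skewness of ratio:     {skewness(ratios):.2f}")
print(f"skewness of log ratio: {skewness(logs):.2f}")
```

A skewness closer to zero for the log ratio is consistent with the more symmetric histogram described above.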

Also displayed in Table 1 are the fraction of simulated confidence intervals
that cover the true $\gamma = \mu_N/\mu_D$ and the sample mean of the lengths of the simulated
confidence intervals. Recall that the endpoints of the simulated confidence
intervals for $\log\gamma$ are exponentiated before computing the length and computing
the sample mean. As a result the sample means of the lengths of the confidence
intervals for $\gamma$ and $\log\gamma$ are comparable. The results indicate that the coverage of
the confidence intervals is reasonable. However, the average length of the
interval can be large. It is particularly large (about 2) when $\mu_N = 1$, $\mu_D = 0.5$ and
$\gamma = 2 = 1/0.5$, i.e. when the denominator is small.
Simulation Study of Estimates of the Standard Deviation
of the Teratogenic Index

$s_N = 0.4/\sqrt{10}$, $s_D = 0.3/\sqrt{10}$, $\gamma = \mu_N/\mu_D$

                          Asym      Sample    Asym       Sample     Fract 95% CI covering
                   Nbr    std dev   std dev   std dev    std dev    (average width)*
                   of     for       for       for        for
μ_N   μ_D   γ      Repl.  γ         γ         log γ      log γ      γ             log γ

1     0.5   2      500    0.46      0.51      0.23       0.23       0.94 (1.99)   0.96 (2.07)
2     1     2      500    0.23      0.25      0.11       0.12       0.95 (0.92)   0.95 (0.93)
1.5   1     1.5    500    0.19      0.20      0.13       0.13       0.94 (0.76)   0.94 (0.77)
3     1     3      500    0.31      0.31      0.10       0.10       0.96 (1.25)   0.96 (1.26)

The asymptotic standard deviations are $v_\gamma$ and $v_{\log\gamma}$; the sample standard
deviations are $\hat\sigma_\gamma$ and $\hat\sigma_{\log\gamma}$.

* Average width is for a CI for γ. The endpoints of the CI for log γ are
exponentiated.
[FIGURE 1: Simulation estimates of teratogenic index; MUN = 1, MUD = 0.5]


Appendix  B 


MEGA MEDAKA STUDY
A Study of Tank Variability

Medaka fish are exposed to different levels of a (potential) toxin. Each
treatment, including a control, has $N_T = 4$ tanks allocated to it. Some of the fish in
each tank are sacrificed at 4, 6, and 9 months after their initial exposure. Their
livers are examined and the number of fish that have hepatocellular neoplasms
and/or carcinomas is recorded.

The  purpose  of  this  note  is  to  study  the  variability  of  the  tanks  within  a 
treatment  level. 

The simplest model is that the fish in all tanks within a treatment are subject
to the same environment. If $X_{i,e}(s)$ is the number of fish in tank $i$ in treatment $e$ at
time $s$ whose livers have neoplasms or carcinomas out of the $N_{i,e}(s)$ that were
sacrificed, then the simplest model is that $\{X_{i,e}(s),\ i = 1, \ldots, N_T\}$ are independent
binomial random variables; $X_{i,e}(s)$ has a binomial distribution with $N_{i,e}(s)$ trials
and probability of occurrence of neoplasm or carcinoma $p_e(s)$. For this model the
maximum likelihood estimate of $p_e(s)$ is

$$\hat p_e(s) = \frac{\sum_{i=1}^{N_T} X_{i,e}(s)}{\sum_{i=1}^{N_T} N_{i,e}(s)}$$

which has asymptotic variance

$$\frac{p_e(s)\,[1 - p_e(s)]}{\sum_{i=1}^{N_T} N_{i,e}(s)}.$$
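
For a concrete illustration of the pooled binomial estimate, the sketch below computes the maximum likelihood estimate and its estimated asymptotic standard error from per-tank counts. The function name `pooled_mle` and the counts used in the example are hypothetical and are not data from the study.

```python
import math


def pooled_mle(abnormal, sacrificed):
    """Pooled binomial MLE of p and its estimated asymptotic standard error.

    abnormal[i]   -- number of fish with neoplasms/carcinomas in tank i
    sacrificed[i] -- number of fish sacrificed from tank i
    """
    total_x = sum(abnormal)
    total_n = sum(sacrificed)
    p_hat = total_x / total_n
    # Plug p_hat into the asymptotic variance p(1 - p) / (total fish).
    se = math.sqrt(p_hat * (1.0 - p_hat) / total_n)
    return p_hat, se


if __name__ == "__main__":
    # Hypothetical counts for four tanks in one treatment at one sacrifice time.
    p_hat, se = pooled_mle([2, 1, 3, 2], [25, 25, 25, 25])
    print(f"p_hat = {p_hat:.3f}, asymptotic SE = {se:.3f}")
```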

This  model  has  implications  about  how  much  variability  the  results  from  each 
tank  can  display. 

To study the tank variability the following calculations are performed within
each treatment group at each time $s$. Each tank was left out in turn and $p_e(s)$ was
estimated using the $n_I(s)$ fish sacrificed from the remaining 3 tanks. The
following simulation is then conducted. Each replication consists of the
following. A binomial random number $O$ with $n_I(s)$ trials and probability of
success $\hat p_e(s)$ is drawn and the fraction $p_O = O/n_I(s)$ is computed. If the left out
tank has $n_O(s)$ fish sacrificed from it, then the 0.05 and 0.95 percentiles, $q_l(i)$ and
$q_u(i)$ in replication $i$, from a binomial distribution with $n_O(s)$ trials and probability of success $p_O$
are found. The simulation is replicated $N_R$ times. A confidence interval for the
fraction of sacrificed fish in the left out tank which exhibit neoplasms/
carcinomas is

$$\left[\ \frac{1}{N_R}\sum_{i=1}^{N_R} q_l(i),\quad \frac{1}{N_R}\sum_{i=1}^{N_R} q_u(i)\ \right].$$
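
The leave-one-out procedure above can be sketched in Python as follows. This is an illustrative reading of the description rather than the original program. The function names, the seed, and the example counts are hypothetical, and the binomial percentiles are divided by the number of fish in the left out tank so that the interval is on the fraction scale; that scaling is an assumption made so the result is comparable with an observed fraction.

```python
import math
import random


def binom_quantile(n, p, q):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        if cdf >= q:
            return k
    return n


def leave_one_out_interval(x_in, n_in, n_out, n_rep=500, seed=0):
    """Simulated interval for the abnormal fraction in the left out tank.

    x_in, n_in -- abnormal and sacrificed fish pooled over the remaining tanks
    n_out      -- fish sacrificed from the left out tank
    """
    rng = random.Random(seed)
    p_hat = x_in / n_in
    lo_sum = hi_sum = 0.0
    for _ in range(n_rep):
        # Binomial draw with n_in trials, built from Bernoulli trials.
        o = sum(rng.random() < p_hat for _ in range(n_in))
        p_o = o / n_in
        # Assumption: percentiles are converted to fractions of n_out.
        lo_sum += binom_quantile(n_out, p_o, 0.05) / n_out
        hi_sum += binom_quantile(n_out, p_o, 0.95) / n_out
    return lo_sum / n_rep, hi_sum / n_rep


if __name__ == "__main__":
    # Hypothetical counts: 15 of 75 fish abnormal in the remaining tanks,
    # 25 fish sacrificed from the left out tank, 4 of them abnormal.
    lo, hi = leave_one_out_interval(15, 75, 25)
    observed = 4 / 25
    print(f"interval = [{lo:.3f}, {hi:.3f}], observed inside: {lo <= observed <= hi}")
```

When no abnormal fish are observed in the remaining tanks, every simulated draw is zero and the interval collapses to [0, 0], so any abnormal fish in the left out tank places its fraction outside the interval.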

The  observed  fraction  is  then  compared  to  this  interval.  If  the  model  is  correct, 
then  the  observed  fraction  should  fall  into  the  confidence  interval  the  majority  of 
the  time.  Results  appear  in  Table  1  for  that  part  of  the  experiment  which  uses  fish 
that  are  6  days  of  age  at  the  start  of  the  experiment  and  Table  2  for  that  part  of 
the  experiment  which  uses  fish  that  are  52  days  of  age  at  the  start  of  the 
experiment. 
The results suggest that there is more variability between tanks for that part
of the experiment that involves fish that are 6 days of age at the start of the test.
They also suggest that, when fitting parametric models to the data, a variable for
tank effect should be included in the model.
Table 1

Fish at 6 Days of Age at Start of Test
(500 replications)

                 Sacrifice Time
                 4 months                 6 months                 9 months
Treatment  Tank  Conf           Fraction  Conf           Fraction  Conf           Fraction
(mg/L)     Left  Interval       Abnormal  Interval       Abnormal  Interval       Abnormal
           Out                  in Left                  in Left                  in Left
                                Out Tank                 Out Tank                 Out Tank

Control    1     [0,0]          0*        [0,0]          0*        [0.008,0.09]   0
           2     [0,0]          0*        [0,0]          0*        [0,0]          0.18
           3     [0,0]          0*        [0,0]          0*        [0.007,0.08]   0.08*
           4     [0,0]          0*        [0,0]          0*        [0.01,0.11]    0

2.5        1     [0.001,0.03]   0         [0.04,0.15]    0.08*     [0.20,0.40]    0.19
           2     [0.001,0.03]   0         [0.01,0.08]    0.24      [0.20,0.40]    0.09
           3     [0,0]          0.04      [0.05,0.17]    0.04      [0.12,0.31]    0.42
           4     [0.001,0.04]   0         [0.06,0.19]    0         [0.15,0.36]    0.29*

5.0        1     [0.01,0.08]    0         [0.13,0.28]    0.04      [0.45,0.69]    0.32
           2     [0.001,0.03]   0.08      [0.10,0.25]    0.12      [0.34,0.58]    0.61
           3     [0.005,0.06]   0.04*     [0.08,0.20]    0.24      [0.41,0.64]    0.43*
           4     [0.01,0.08]    0         [0.07,0.20]    0.25      [0.33,0.57]    0.67

10.0       1     [0.04,0.15]    0.12      [0.32,0.51]    0.20      [0.63,0.85]    0.75*
           2     [0.03,0.13]    0.16      [0.24,0.42]    0.44      [0.62,0.83]    0.81*
           3     [0.07,0.20]    0         [0.29,0.48]    0.28      [0.68,0.88]    0.64
           4     [0.04,0.15]    0.12*     [0.22,0.39]    0.52      [0.63,0.84]    0.79*

* Fraction of fish with neoplasms/carcinomas for left out tank falls within the
confidence interval computed with the other tanks.
Table 2

Fish at 52 Days of Age at Start of Test
(500 replications)

                 Sacrifice Time
                 4 months                 6 months                 9 months
Treatment  Tank  Conf           Fraction  Conf           Fraction  Conf           Fraction
(mg/L)     Left  Interval       Abnormal  Interval       Abnormal  Interval       Abnormal
           Out                  in Left                  in Left                  in Left
                                Out Tank                 Out Tank                 Out Tank

Control    1     [0.006,0.06]   0         [0,0]          0*        [0.001,0.04]   0.04*
           2     [0.001,0.04]   0.04*     [0,0]          0*        [0.005,0.06]   0
           3     [0.005,0.06]   0         [0,0]          0*        [0.005,0.06]   0
           4     [0.001,0.03]   0.04      [0,0]          0*        [0.001,0.04]   0.04*

2.5        1     [0,0]          0.04      [0.005,0.06]   0.08      [0.03,0.13]    0.32
           2     [0.001,0.03]   0         [0.02,0.10]    0         [0.10,0.25]    0.04
           3     [0.001,0.03]   0         [0.02,0.10]    0         [0.07,0.21]    0.13*
           4     [0.001,0.03]   0         [0.005,0.06]   0.08      [0.09,0.24]    0.05

5.0        1     [0,0]          0*        [0.05,0.17]    0.16*     [0.05,0.18]    0.17*
           2     [0,0]          0*        [0.06,0.18]    0.12*     [0.06,0.18]    0.18*
           3     [0,0]          0*        [0.05,0.17]    0.16*     [0.06,0.19]    0.13*
           4     [0,0]          0*        [0.08,0.21]    0.04      [0.09,0.23]    0.04

10.0       1     [0.01,0.08]    0         [0.12,0.26]    0.12*     [0.20,0.38]    0.33*
           2     [0.005,0.06]   0.04*     [0.11,0.25]    0.16*     [0.26,0.45]    0.14
           3     [0.006,0.06]   0.04*     [0.10,0.24]    0.16*     [0.16,0.34]    0.43
           4     [0.005,0.06]   0.04*     [0.08,0.21]    0.24      [0.22,0.41]    0.27*

20.0       1     [0.05,0.17]    0         [0.24,0.42]    0.24*     [0.51,0.71]    0.59*
           2     [0.03,0.13]    0.08*     [0.20,0.37]    0.40      [0.48,0.70]    0.65
           3     [0.02,0.10]    0.17      [0.21,0.38]    0.36*     [0.53,0.73]    0.55*
           4     [0.03,0.13]    0.08*     [0.25,0.43]    0.24      [0.50,0.70]    0.63

* Fraction of fish with neoplasms/carcinomas for left out tank falls within the
confidence interval computed with the other tanks.